- Write a function that takes as input two lists or two 1-D arrays of real numbers of the same length and returns their Euclidean distance. Hint: use vectorization.
import numpy as np
import math
def distance(p1, p2):
    if len(p1) != len(p2):
        return 'errore'
    else:
        differenza = p1 - p2
        quadrato = differenza**2
        somma = np.sum(quadrato)
        return math.sqrt(somma)
p1 = np.array([1, 2])
p2 = np.array([2, 1])
print(distance(p1, p2))
1.4142135623730951
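For reference, the same computation can be written more compactly with `np.linalg.norm`, which returns the Euclidean norm of the difference vector (a sketch equivalent to the function above, with the same string-sentinel error handling):

```python
import numpy as np

def distance_norm(p1, p2):
    # Euclidean distance as the 2-norm of the difference vector
    p1, p2 = np.asarray(p1), np.asarray(p2)
    if p1.shape != p2.shape:
        return 'errore'
    return np.linalg.norm(p1 - p2)

print(distance_norm([1, 2], [2, 1]))  # 1.4142135623730951, as above
```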
- Import the data set and display the type of each variable.
import os
os.getcwd()
'/Users/ludovicavargiu'
os.chdir('/Users/ludovicavargiu/Desktop/Laboratorio Python')
import pandas as pd
cars = pd.read_csv('cars.csv')
cars
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car_name | price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | USA | chevrolet chevelle malibu | 25561.59078 |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | USA | buick skylark 320 | 24221.42273 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | USA | plymouth satellite | 27240.84373 |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | USA | amc rebel sst | 33684.96888 |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | USA | ford torino | 20000.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 387 | 27.0 | 4 | 140.0 | 86 | 2790 | 15.6 | 82 | USA | ford mustang gl | 13432.50000 |
| 388 | 44.0 | 4 | 97.0 | 52 | 2130 | 24.6 | 82 | Europe | vw pickup | 37000.00000 |
| 389 | 32.0 | 4 | 135.0 | 84 | 2295 | 11.6 | 82 | USA | dodge rampage | 47800.00000 |
| 390 | 28.0 | 4 | 120.0 | 79 | 2625 | 18.6 | 82 | USA | ford ranger | 46000.00000 |
| 391 | 31.0 | 4 | 119.0 | 82 | 2720 | 19.4 | 82 | USA | chevy s-10 | 9000.00000 |
392 rows × 10 columns
cars.dtypes
mpg             float64
cylinders         int64
displacement    float64
horsepower        int64
weight            int64
acceleration    float64
model             int64
origin           object
car_name         object
price           float64
dtype: object
- Select the first 10 rows of the dataset and the columns 'mpg' and 'acceleration', using both loc and iloc.
iloc = cars.iloc[0:10, [0, 5]]
iloc
| | mpg | acceleration |
|---|---|---|
| 0 | 18.0 | 12.0 |
| 1 | 15.0 | 11.5 |
| 2 | 18.0 | 11.0 |
| 3 | 16.0 | 12.0 |
| 4 | 17.0 | 10.5 |
| 5 | 15.0 | 10.0 |
| 6 | 14.0 | 9.0 |
| 7 | 14.0 | 8.5 |
| 8 | 14.0 | 10.0 |
| 9 | 15.0 | 8.5 |
loc = cars.loc[0:9, ['mpg', 'acceleration']]
loc
| | mpg | acceleration |
|---|---|---|
| 0 | 18.0 | 12.0 |
| 1 | 15.0 | 11.5 |
| 2 | 18.0 | 11.0 |
| 3 | 16.0 | 12.0 |
| 4 | 17.0 | 10.5 |
| 5 | 15.0 | 10.0 |
| 6 | 14.0 | 9.0 |
| 7 | 14.0 | 8.5 |
| 8 | 14.0 | 10.0 |
| 9 | 15.0 | 8.5 |
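Note the off-by-one between the two selections above: `iloc` slices by position and excludes the end (`0:10` gives rows 0 through 9), while `loc` slices by label and includes the end (`0:9` gives the same rows). A minimal sketch on a small synthetic frame:

```python
import pandas as pd

df_demo = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

# positional slice: end excluded -> rows 0, 1, 2
by_pos = df_demo.iloc[0:3, [0]]
# label slice: end included -> the same rows 0, 1, 2
by_lab = df_demo.loc[0:2, ['a']]

print(by_pos.equals(by_lab))  # True
```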
cars[(cars['horsepower'] > 150) & (cars['acceleration'] < 12)]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car_name | price |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | USA | buick skylark 320 | 24221.422730 |
| 5 | 15.0 | 8 | 429.0 | 198 | 4341 | 10.0 | 70 | USA | ford galaxie 500 | 30000.000000 |
| 6 | 14.0 | 8 | 454.0 | 220 | 4354 | 9.0 | 70 | USA | chevrolet impala | 35764.334900 |
| 7 | 14.0 | 8 | 440.0 | 215 | 4312 | 8.5 | 70 | USA | plymouth fury iii | 25899.465570 |
| 8 | 14.0 | 8 | 455.0 | 225 | 4425 | 10.0 | 70 | USA | pontiac catalina | 32882.537140 |
| 9 | 15.0 | 8 | 390.0 | 190 | 3850 | 8.5 | 70 | USA | amc ambassador dpl | 32617.059280 |
| 10 | 15.0 | 8 | 383.0 | 170 | 3563 | 10.0 | 70 | USA | dodge challenger se | 30000.000000 |
| 11 | 14.0 | 8 | 340.0 | 160 | 3609 | 8.0 | 70 | USA | plymouth 'cuda 340 | 33034.922610 |
| 13 | 14.0 | 8 | 455.0 | 225 | 3086 | 10.0 | 70 | USA | buick estate wagon (sw) | 26608.328420 |
| 38 | 14.0 | 8 | 400.0 | 175 | 4464 | 11.5 | 71 | USA | pontiac catalina brougham | 33793.722840 |
| 41 | 12.0 | 8 | 383.0 | 180 | 4955 | 11.5 | 71 | USA | dodge monaco (sw) | 23414.417100 |
| 66 | 11.0 | 8 | 429.0 | 208 | 4633 | 11.0 | 72 | USA | mercury marquis | 16223.268340 |
| 89 | 12.0 | 8 | 429.0 | 198 | 4952 | 11.5 | 73 | USA | mercury marquis brougham | 30000.000000 |
| 93 | 13.0 | 8 | 440.0 | 215 | 4735 | 11.0 | 73 | USA | chrysler new yorker brougham | 40000.000000 |
| 94 | 12.0 | 8 | 455.0 | 225 | 4951 | 11.0 | 73 | USA | buick electra 225 custom | 26260.634730 |
| 95 | 13.0 | 8 | 360.0 | 175 | 3821 | 11.0 | 73 | USA | amc ambassador brougham | 40000.000000 |
| 115 | 16.0 | 8 | 400.0 | 230 | 4278 | 9.5 | 73 | USA | pontiac grand prix | 27104.011820 |
| 123 | 11.0 | 8 | 350.0 | 180 | 3664 | 11.0 | 73 | USA | oldsmobile omega | 9006.648949 |
| 154 | 16.0 | 8 | 400.0 | 170 | 4668 | 11.5 | 75 | USA | pontiac catalina | 40000.000000 |
| 227 | 16.0 | 8 | 400.0 | 180 | 4220 | 11.1 | 77 | USA | pontiac grand prix lj | 16330.963190 |
| 228 | 15.5 | 8 | 350.0 | 170 | 4165 | 11.4 | 77 | USA | chevrolet monte carlo landau | 40000.000000 |
- How many cars were produced in Japan? Draw a bar chart with the number of cars per country of production.
cars['origin'].value_counts()
origin
USA       245
Japan      79
Europe     68
Name: count, dtype: int64
import seaborn as sns
sns.catplot(data = cars, x = 'origin', kind = 'count')
- Compute the mean car price grouped by 'model'.
cars[['price', 'model']].groupby('model').mean()
| | price |
|---|---|
| model | |
| 70 | 27464.198038 |
| 71 | 29766.801306 |
| 72 | 27407.591721 |
| 73 | 31395.407957 |
| 74 | 28714.623571 |
| 75 | 26359.907582 |
| 76 | 29185.822009 |
| 77 | 29012.154454 |
| 78 | 32967.374645 |
| 79 | 29340.843358 |
| 80 | 33269.261384 |
| 81 | 29000.531344 |
| 82 | 30438.724092 |
- Add a new variable 'kml' to the dataset, obtained by converting 'mpg' into km/l using the conversion factor given in the variable description. Represent the distribution of 'kml' conditional on the car's country of origin with boxplots.
cars['kml'] = cars['mpg']*0.425170
cars
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car_name | price | kml |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | USA | chevrolet chevelle malibu | 25561.59078 | 7.65306 |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | USA | buick skylark 320 | 24221.42273 | 6.37755 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | USA | plymouth satellite | 27240.84373 | 7.65306 |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | USA | amc rebel sst | 33684.96888 | 6.80272 |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | USA | ford torino | 20000.00000 | 7.22789 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 387 | 27.0 | 4 | 140.0 | 86 | 2790 | 15.6 | 82 | USA | ford mustang gl | 13432.50000 | 11.47959 |
| 388 | 44.0 | 4 | 97.0 | 52 | 2130 | 24.6 | 82 | Europe | vw pickup | 37000.00000 | 18.70748 |
| 389 | 32.0 | 4 | 135.0 | 84 | 2295 | 11.6 | 82 | USA | dodge rampage | 47800.00000 | 13.60544 |
| 390 | 28.0 | 4 | 120.0 | 79 | 2625 | 18.6 | 82 | USA | ford ranger | 46000.00000 | 11.90476 |
| 391 | 31.0 | 4 | 119.0 | 82 | 2720 | 19.4 | 82 | USA | chevy s-10 | 9000.00000 | 13.18027 |
392 rows × 11 columns
sns.catplot(data = cars, x = 'origin', y = 'kml', kind = 'box')
- Represent the relationships between the quantitative variables of the data set using pairplot, coloring the points by number of cylinders.
sns.pairplot(data = cars, hue = 'cylinders')
- Fit a linear regression model with 'horsepower' as explanatory variable and 'weight' as response, including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous point?
import statsmodels.api as sm
lm = sm.OLS(cars.weight, sm.add_constant(cars.horsepower))
res = lm.fit()
print(res.summary())
OLS Regression Results
==============================================================================
Dep. Variable: weight R-squared: 0.747
Model: OLS Adj. R-squared: 0.747
Method: Least Squares F-statistic: 1154.
Date: Tue, 21 Jan 2025 Prob (F-statistic): 1.36e-118
Time: 13:03:20 Log-Likelihood: -2929.9
No. Observations: 392 AIC: 5864.
Df Residuals: 390 BIC: 5872.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 984.5003 62.514 15.748 0.000 861.593 1107.408
horsepower 19.0782 0.562 33.972 0.000 17.974 20.182
==============================================================================
Omnibus: 11.785 Durbin-Watson: 0.933
Prob(Omnibus): 0.003 Jarque-Bera (JB): 21.895
Skew: 0.109 Prob(JB): 1.76e-05
Kurtosis: 4.137 Cond. No. 322.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The slope is positive, and therefore consistent with the previous scatterplot.
sns.lmplot(x = 'horsepower', y = 'weight', data = cars, ci = None)
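As a cross-check of the fitted coefficients: with an intercept, the OLS slope equals cov(x, y)/var(x) and the intercept equals ȳ − slope·x̄. A self-contained sketch on synthetic data (not the cars dataset; the generating numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(50, 230, size=200)             # synthetic "horsepower"
y = 1000 + 19 * x + rng.normal(0, 300, 200)    # synthetic "weight"

# closed-form OLS with intercept
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# the same numbers via least squares on the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose([intercept, slope], beta))  # True
```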
Consider the sequence f(n) = (1 + a)^n, defined for a ≥ −1 and n ≥ 0. Write a function with two arguments, a and n, that returns the n-th term of the sequence and prints an error message if the constraints on a and n are not satisfied. For example, if a = 5 and n = 2, then f(2) = (1 + 5)^2 = 36, while a = −3 should produce an error message. Write the function both in recursive and in non-recursive form.
def func(a, n):
    if a < -1 or n < 0:
        return 'errore'
    else:
        return (1+a)**n
print(func(-3, 4))
print(func(5, 2))
print(func(3, -2))
print(func(-3, -5))
errore
36
errore
errore
def func_r(a, n):
    if a < -1 or n < 0:
        return 'errore'
    if n == 0:
        return 1
    else:
        return (1+a) * func_r(a, n-1)
print(func_r(-3, 4))
print(func_r(5, 2))
print(func_r(3, -2))
print(func_r(-3, -5))
errore
36
errore
errore
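Returning the string 'errore' satisfies the exercise, but in idiomatic Python a violated precondition would raise an exception rather than return a sentinel value. A sketch of the same recursive function using `ValueError`:

```python
def func_exc(a, n):
    # raise instead of returning a sentinel string
    if a < -1 or n < 0:
        raise ValueError('a must be >= -1 and n >= 0')
    if n == 0:
        return 1
    return (1 + a) * func_exc(a, n - 1)

print(func_exc(5, 2))  # 36
```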
import os
os.getcwd()
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
import pandas as pd
phd = pd.read_csv('phd.csv')
phd
| | articles | gender | married | kids | prestige | mentor |
|---|---|---|---|---|---|---|
| 0 | 0 | male | yes | 0 | 2.52 | 7 |
| 1 | 0 | female | no | 0 | 2.05 | 6 |
| 2 | 0 | female | no | 0 | 3.75 | 6 |
| 3 | 0 | male | yes | 1 | 1.18 | 3 |
| 4 | 0 | female | no | 0 | 3.75 | 26 |
| ... | ... | ... | ... | ... | ... | ... |
| 910 | 11 | male | yes | 2 | 2.86 | 7 |
| 911 | 12 | male | yes | 1 | 4.29 | 35 |
| 912 | 12 | male | yes | 1 | 1.86 | 5 |
| 913 | 16 | male | yes | 0 | 1.74 | 21 |
| 914 | 19 | male | yes | 0 | 1.86 | 42 |
915 rows × 6 columns
phd.dtypes
articles      int64
gender       object
married      object
kids          int64
prestige    float64
mentor        int64
dtype: object
- Select rows five through twelve of the dataset and the columns 'gender' and 'prestige', using both loc and iloc.
iloc = phd.iloc[4:12, [1, 4]]
iloc
| | gender | prestige |
|---|---|---|
| 4 | female | 3.750 |
| 5 | female | 3.590 |
| 6 | female | 3.190 |
| 7 | male | 2.960 |
| 8 | male | 4.620 |
| 9 | female | 1.250 |
| 10 | male | 2.960 |
| 11 | female | 0.755 |
loc = phd.loc[4:11, ['gender', 'prestige']]
loc
| | gender | prestige |
|---|---|---|
| 4 | female | 3.750 |
| 5 | female | 3.590 |
| 6 | female | 3.190 |
| 7 | male | 2.960 |
| 8 | male | 4.620 |
| 9 | female | 1.250 |
| 10 | male | 2.960 |
| 11 | female | 0.755 |
- Select the observations with more than 5 articles and fewer than two children.
phd[(phd['articles'] > 5) & (phd['kids'] < 2)]
| | articles | gender | married | kids | prestige | mentor |
|---|---|---|---|---|---|---|
| 877 | 6 | male | yes | 1 | 4.62 | 8 |
| 878 | 6 | female | yes | 0 | 2.10 | 36 |
| 880 | 6 | male | yes | 0 | 4.34 | 9 |
| 881 | 6 | female | yes | 0 | 4.29 | 24 |
| 883 | 6 | male | yes | 1 | 2.96 | 13 |
| 884 | 6 | male | no | 0 | 4.29 | 18 |
| 885 | 6 | male | no | 0 | 3.40 | 14 |
| 886 | 6 | female | no | 0 | 4.54 | 12 |
| 887 | 6 | male | yes | 1 | 3.85 | 16 |
| 888 | 6 | female | no | 0 | 3.15 | 9 |
| 889 | 6 | female | no | 0 | 4.54 | 15 |
| 890 | 6 | male | no | 0 | 3.47 | 6 |
| 891 | 6 | female | yes | 0 | 4.29 | 1 |
| 892 | 6 | male | no | 0 | 1.97 | 4 |
| 893 | 6 | female | no | 0 | 3.32 | 6 |
| 894 | 7 | male | yes | 0 | 3.59 | 1 |
| 895 | 7 | male | no | 0 | 2.54 | 6 |
| 896 | 7 | male | no | 0 | 3.41 | 20 |
| 897 | 7 | male | yes | 1 | 1.97 | 0 |
| 898 | 7 | female | no | 0 | 3.15 | 9 |
| 899 | 7 | male | no | 0 | 4.62 | 15 |
| 900 | 7 | male | no | 0 | 4.54 | 42 |
| 901 | 7 | male | yes | 0 | 3.69 | 9 |
| 902 | 7 | male | no | 0 | 4.34 | 19 |
| 903 | 7 | male | no | 0 | 4.29 | 19 |
| 904 | 7 | male | yes | 1 | 3.59 | 27 |
| 905 | 7 | male | no | 0 | 3.69 | 19 |
| 906 | 8 | male | yes | 0 | 2.51 | 11 |
| 907 | 9 | male | yes | 1 | 2.96 | 23 |
| 908 | 9 | male | yes | 1 | 1.86 | 47 |
| 909 | 10 | female | yes | 0 | 3.59 | 18 |
| 911 | 12 | male | yes | 1 | 4.29 | 35 |
| 912 | 12 | male | yes | 1 | 1.86 | 5 |
| 913 | 16 | male | yes | 0 | 1.74 | 21 |
| 914 | 19 | male | yes | 0 | 1.86 | 42 |
- How many PhD students are married?
phd['married'].value_counts()
married
yes    606
no     309
Name: count, dtype: int64
- What is the mean number of published articles conditional on the number of children?
phd[['articles', 'kids']].groupby('kids').mean()
| | articles |
|---|---|
| kids | |
| 0 | 1.721202 |
| 1 | 1.758974 |
| 2 | 1.542857 |
| 3 | 0.812500 |
- Represent the distribution of 'articles' conditional on the number of children with boxplots.
import seaborn as sns
sns.catplot(data = phd, x = 'kids', y = 'articles', kind = 'box')
- Draw the histogram of 'prestige', splitting the plot into two facets based on 'gender'.
sns.displot(data = phd, x = 'prestige', col = 'gender') # method 1
g = sns.FacetGrid(phd, col = 'gender') # method 2
g.map(sns.histplot, 'prestige')
- Fit a linear regression model with 'prestige' as explanatory variable and 'mentor' as response, including the intercept. Plot the regression line.
import statsmodels.api as sm
lm = sm.OLS(phd.mentor, sm.add_constant(phd.prestige))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'prestige', y = 'mentor', data = phd, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: mentor R-squared: 0.068
Model: OLS Adj. R-squared: 0.067
Method: Least Squares F-statistic: 66.42
Date: Tue, 21 Jan 2025 Prob (F-statistic): 1.19e-15
Time: 13:59:05 Log-Likelihood: -3324.1
No. Observations: 915 AIC: 6652.
Df Residuals: 913 BIC: 6662.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.9807 1.002 0.978 0.328 -0.986 2.948
prestige 2.5093 0.308 8.150 0.000 1.905 3.114
==============================================================================
Omnibus: 515.309 Durbin-Watson: 1.763
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4360.948
Skew: 2.472 Prob(JB): 0.00
Kurtosis: 12.484 Cond. No. 11.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Write a universal function (ufunc) that takes as input a list of real numbers and returns the corresponding values of the function f(x) = x^3 − x + 1. Create a list of numbers x and the corresponding values y returned by the function, and plot them as a continuous line using matplotlib.
import numpy as np
import matplotlib.pyplot as plt
def f(x):
    return x**3 - x + 1
x = list(range(-10, 10))
y = np.frompyfunc(f, 1, 1)
plt.plot(x, y(x))
plt.title('plot of the function y = x^3 - x + 1')
plt.xlabel('x axis')
plt.ylabel('y axis')
print(f'x values: {x}')
print(f'y values: {y(x)}')
x values: [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y values: [-989 -719 -503 -335 -209 -119 -59 -23 -5 1 1 1 7 25 61 121 211 337 505 721]
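Note that `np.frompyfunc` returns arrays with dtype `object`. Since f is built only from arithmetic operators that NumPy already broadcasts, applying f directly to an ndarray gives a numeric-dtype result with no wrapper needed (a sketch):

```python
import numpy as np

def f(x):
    return x**3 - x + 1

x = np.arange(-10, 10)   # ndarray instead of a list
y = f(x)                 # plain broadcasting, no frompyfunc
print(y.dtype)           # a native integer dtype, not object

y_obj = np.frompyfunc(f, 1, 1)(x)
print(y_obj.dtype)       # object
print((y == y_obj.astype(int)).all())  # True: same values either way
```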
import os
os.getcwd()
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
import pandas as pd
df = pd.read_csv('homes.csv')
df
| | yearbuilt | finsqft | cooling | bedroom | fullbath | halfbath | lotsize | totalvalue | hsdistrict | age | condition | fp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1754 | 1254 | No Central Air | 1 | 1 | 0 | 4.933 | 124300 | Western Albemarle | 265 | Substandard | 0 |
| 1 | 1968 | 1192 | No Central Air | 3 | 1 | 0 | 1.087 | 109200 | Monticello | 51 | Substandard | 0 |
| 2 | 1754 | 881 | No Central Air | 2 | 1 | 0 | 195.930 | 141600 | Albemarle | 265 | Substandard | 0 |
| 3 | 1934 | 480 | No Central Air | 0 | 0 | 0 | 10.000 | 69200 | Western Albemarle | 85 | Substandard | 0 |
| 4 | 1963 | 720 | No Central Air | 2 | 1 | 0 | 1.000 | 139700 | Western Albemarle | 56 | Substandard | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3015 | 1965 | 1140 | No Central Air | 3 | 1 | 0 | 0.490 | 222600 | Monticello | 54 | Excellent | 0 |
| 3016 | 1995 | 6963 | Central Air | 4 | 5 | 1 | 8.820 | 2746700 | Western Albemarle | 24 | Excellent | 1 |
| 3017 | 1885 | 1744 | Central Air | 3 | 2 | 0 | 4.160 | 333000 | Monticello | 134 | Excellent | 1 |
| 3018 | 1988 | 1638 | Central Air | 4 | 3 | 0 | 3.815 | 257900 | Albemarle | 31 | Excellent | 0 |
| 3019 | 1955 | 1659 | Central Air | 2 | 2 | 0 | 0.523 | 286300 | Albemarle | 64 | Excellent | 0 |
3020 rows × 12 columns
df.dtypes
yearbuilt       int64
finsqft         int64
cooling        object
bedroom         int64
fullbath        int64
halfbath        int64
lotsize       float64
totalvalue      int64
hsdistrict     object
age             int64
condition      object
fp              int64
dtype: object
- Select rows seven through fifteen of the dataset and the columns 'bedroom' and 'lotsize', using both loc and iloc.
iloc = df.iloc[6:15, [3, 6]]
iloc
| | bedroom | lotsize |
|---|---|---|
| 6 | 2 | 4.017 |
| 7 | 3 | 0.950 |
| 8 | 2 | 0.750 |
| 9 | 0 | 13.525 |
| 10 | 2 | 0.910 |
| 11 | 1 | 0.270 |
| 12 | 2 | 1.500 |
| 13 | 3 | 3.003 |
| 14 | 3 | 3.740 |
loc = df.loc[6:14, ['bedroom', 'lotsize']]
loc
| | bedroom | lotsize |
|---|---|---|
| 6 | 2 | 4.017 |
| 7 | 3 | 0.950 |
| 8 | 2 | 0.750 |
| 9 | 0 | 13.525 |
| 10 | 2 | 0.910 |
| 11 | 1 | 0.270 |
| 12 | 2 | 1.500 |
| 13 | 3 | 3.003 |
| 14 | 3 | 3.740 |
- Select the observations whose year of construction is 1960 or later and whose size is below 800 square feet.
df[(df['yearbuilt'] >= 1960) & (df['finsqft'] < 800)]
| | yearbuilt | finsqft | cooling | bedroom | fullbath | halfbath | lotsize | totalvalue | hsdistrict | age | condition | fp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1963 | 720 | No Central Air | 2 | 1 | 0 | 1.000 | 139700 | Western Albemarle | 56 | Substandard | 0 |
| 20 | 1962 | 720 | Central Air | 3 | 1 | 0 | 0.500 | 61600 | Western Albemarle | 57 | Poor | 0 |
| 28 | 1968 | 672 | No Central Air | 2 | 1 | 0 | 1.500 | 99600 | Monticello | 51 | Poor | 0 |
| 265 | 1961 | 768 | Central Air | 2 | 1 | 0 | 1.464 | 148000 | Monticello | 58 | Average | 0 |
| 365 | 1979 | 748 | Central Air | 3 | 2 | 0 | 13.090 | 248900 | Monticello | 40 | Average | 0 |
| 430 | 1970 | 768 | No Central Air | 2 | 1 | 0 | 15.250 | 207700 | Monticello | 49 | Average | 0 |
| 432 | 1960 | 720 | Central Air | 3 | 2 | 0 | 2.971 | 185000 | Albemarle | 59 | Average | 0 |
| 475 | 1982 | 768 | Central Air | 2 | 1 | 0 | 2.604 | 171300 | Western Albemarle | 37 | Average | 0 |
| 487 | 1993 | 784 | Central Air | 2 | 1 | 0 | 11.500 | 207200 | Western Albemarle | 26 | Average | 0 |
| 673 | 1982 | 406 | No Central Air | 0 | 1 | 0 | 2.000 | 122700 | Monticello | 37 | Average | 1 |
| 849 | 1970 | 384 | No Central Air | 0 | 0 | 0 | 7.190 | 153000 | Western Albemarle | 49 | Average | 1 |
| 937 | 1983 | 796 | Central Air | 1 | 2 | 0 | 2.350 | 262700 | Western Albemarle | 36 | Average | 0 |
| 1007 | 1985 | 768 | No Central Air | 2 | 1 | 0 | 2.200 | 126100 | Monticello | 34 | Average | 0 |
| 1035 | 1970 | 480 | No Central Air | 2 | 1 | 0 | 1.040 | 66200 | Monticello | 49 | Average | 0 |
| 1533 | 2006 | 768 | Central Air | 2 | 1 | 0 | 0.350 | 157300 | Western Albemarle | 13 | Average | 0 |
| 1535 | 1994 | 640 | No Central Air | 2 | 1 | 0 | 1.882 | 90100 | Monticello | 25 | Average | 0 |
| 1619 | 1960 | 702 | No Central Air | 1 | 1 | 0 | 5.000 | 279300 | Western Albemarle | 59 | Average | 1 |
| 1709 | 2016 | 736 | Central Air | 1 | 1 | 0 | 0.029 | 203300 | Western Albemarle | 3 | Average | 0 |
| 2072 | 1966 | 640 | No Central Air | 2 | 1 | 0 | 1.650 | 152000 | Albemarle | 53 | Average | 0 |
| 2595 | 2006 | 786 | No Central Air | 3 | 2 | 0 | 5.000 | 172700 | Western Albemarle | 13 | Good | 0 |
- How many houses are in excellent condition? Draw a bar chart with the number of properties per condition.
df['condition'].value_counts()
condition
Average        2304
Good            507
Fair            133
Poor             32
Excellent        29
Substandard      15
Name: count, dtype: int64
import seaborn as sns
sns.catplot(data = df, x = 'condition', kind = 'count')
- Compute the mean property size grouped by number of bedrooms.
df[['bedroom', 'finsqft']].groupby('bedroom').mean()
| | finsqft |
|---|---|
| bedroom | |
| 0 | 917.500000 |
| 1 | 1284.333333 |
| 2 | 1339.210356 |
| 3 | 1729.750000 |
| 4 | 2511.640914 |
| 5 | 3107.096774 |
| 6 | 3963.289474 |
| 7 | 4432.888889 |
| 8 | 6736.000000 |
- Add a new variable 'price100' to the dataset, obtained by dividing 'totalvalue' by 100 000. Represent the distribution of the new variable conditional on the district with boxplots.
df['price100'] = df['totalvalue']/100000
df
| | yearbuilt | finsqft | cooling | bedroom | fullbath | halfbath | lotsize | totalvalue | hsdistrict | age | condition | fp | price100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1754 | 1254 | No Central Air | 1 | 1 | 0 | 4.933 | 124300 | Western Albemarle | 265 | Substandard | 0 | 1.243 |
| 1 | 1968 | 1192 | No Central Air | 3 | 1 | 0 | 1.087 | 109200 | Monticello | 51 | Substandard | 0 | 1.092 |
| 2 | 1754 | 881 | No Central Air | 2 | 1 | 0 | 195.930 | 141600 | Albemarle | 265 | Substandard | 0 | 1.416 |
| 3 | 1934 | 480 | No Central Air | 0 | 0 | 0 | 10.000 | 69200 | Western Albemarle | 85 | Substandard | 0 | 0.692 |
| 4 | 1963 | 720 | No Central Air | 2 | 1 | 0 | 1.000 | 139700 | Western Albemarle | 56 | Substandard | 0 | 1.397 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3015 | 1965 | 1140 | No Central Air | 3 | 1 | 0 | 0.490 | 222600 | Monticello | 54 | Excellent | 0 | 2.226 |
| 3016 | 1995 | 6963 | Central Air | 4 | 5 | 1 | 8.820 | 2746700 | Western Albemarle | 24 | Excellent | 1 | 27.467 |
| 3017 | 1885 | 1744 | Central Air | 3 | 2 | 0 | 4.160 | 333000 | Monticello | 134 | Excellent | 1 | 3.330 |
| 3018 | 1988 | 1638 | Central Air | 4 | 3 | 0 | 3.815 | 257900 | Albemarle | 31 | Excellent | 0 | 2.579 |
| 3019 | 1955 | 1659 | Central Air | 2 | 2 | 0 | 0.523 | 286300 | Albemarle | 64 | Excellent | 0 | 2.863 |
3020 rows × 13 columns
sns.catplot(data = df, x = 'hsdistrict', y = 'price100', kind = 'box')
- Represent the relationships between the variables 'finsqft', 'totalvalue', 'lotsize', 'age' using pairplot, coloring the points by cooling system.
sns.pairplot(data = df, vars = ['finsqft', 'totalvalue', 'lotsize', 'age'], hue = 'cooling')
- Fit a linear regression model with 'finsqft' as explanatory variable and 'totalvalue' as response, including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous point?
import statsmodels.api as sm
lm = sm.OLS(df.totalvalue, sm.add_constant(df.finsqft))
res = lm.fit()
print(res.summary())
OLS Regression Results
==============================================================================
Dep. Variable: totalvalue R-squared: 0.566
Model: OLS Adj. R-squared: 0.566
Method: Least Squares F-statistic: 3937.
Date: Tue, 21 Jan 2025 Prob (F-statistic): 0.00
Time: 14:53:08 Log-Likelihood: -41719.
No. Observations: 3020 AIC: 8.344e+04
Df Residuals: 3018 BIC: 8.345e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -1.662e+05 1.04e+04 -15.935 0.000 -1.87e+05 -1.46e+05
finsqft 288.2442 4.594 62.748 0.000 279.237 297.251
==============================================================================
Omnibus: 4319.800 Durbin-Watson: 1.995
Prob(Omnibus): 0.000 Jarque-Bera (JB): 2511875.784
Skew: 8.117 Prob(JB): 0.00
Kurtosis: 143.351 Cond. No. 5.38e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The slope is positive, consistent with the previous scatterplot.
sns.lmplot(x = 'finsqft', y = 'totalvalue', data = df, ci = None)
Write a function to convert a length expressed in yards into a length expressed in metres, and vice versa. The function must take three keyword-only arguments: the first two keywords are the initial unit ('misura_iniziale') and the target unit ('misura_finale'), each of which may only take the values 'iarde' (yards) and 'metri' (metres), while the third keyword is 'lunghezza' and must be a real number, the length to convert. So, for example, if misura_iniziale = 'iarde' and misura_finale = 'metri', the given length must be converted from yards to metres. If either of the first two keywords differs from 'iarde' and 'metri', and/or if the given length is negative, an error must be returned. Note that 1 yard = 0.9144 metres.
def conv(*, misura_iniziale, misura_finale, lunghezza):
    if misura_iniziale.lower() not in ['metri', 'iarde'] or misura_finale.lower() not in ['metri', 'iarde'] or lunghezza < 0:
        return 'errore'
    if misura_iniziale.lower() == 'metri' and misura_finale.lower() == 'iarde':
        return lunghezza/0.9144
    elif misura_iniziale.lower() == 'iarde' and misura_finale.lower() == 'metri':
        return lunghezza*0.9144
    else:
        return 'non è necessaria alcuna conversione'
print(conv(misura_iniziale = 'iarde', misura_finale = 'metri', lunghezza = 12))
print(conv(misura_iniziale = 'metri', misura_finale = 'iarde', lunghezza = 12))
10.9728
13.123359580052494
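The same pattern extends to more units by storing, for each unit, its factor to a common base unit (metres) in a dictionary, instead of enumerating direction pairs. A sketch; the extra 'piedi' (feet) entry is an illustrative addition, not part of the exercise:

```python
# metres per one unit of each supported measure ('piedi' is illustrative)
TO_METRES = {'metri': 1.0, 'iarde': 0.9144, 'piedi': 0.3048}

def conv_generic(*, misura_iniziale, misura_finale, lunghezza):
    src = misura_iniziale.lower()
    dst = misura_finale.lower()
    if src not in TO_METRES or dst not in TO_METRES or lunghezza < 0:
        return 'errore'
    # convert to metres, then from metres to the target unit
    return lunghezza * TO_METRES[src] / TO_METRES[dst]

print(conv_generic(misura_iniziale='iarde', misura_finale='metri', lunghezza=12))
```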
import os
os.getcwd()
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
os.chdir('/Users/ludovicavargiu/Downloads')
diabete = pd.read_csv('diabete.csv')
diabete
| | chol | stab.glu | hdl | ratio | glyhb | age | gender | height | weight | bp.1s | bp.1d | waist | hip | time.ppn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203 | 82 | 56 | 3.6 | 0 | 46 | female | 62 | 121 | 118 | 59 | 29 | 38 | 720 |
| 1 | 165 | 97 | 24 | 6.9 | 0 | 29 | female | 64 | 218 | 112 | 68 | 46 | 48 | 360 |
| 2 | 228 | 92 | 37 | 6.2 | 0 | 58 | female | 61 | 256 | 190 | 92 | 49 | 57 | 180 |
| 3 | 78 | 93 | 12 | 6.5 | 0 | 67 | male | 67 | 119 | 110 | 50 | 33 | 38 | 480 |
| 4 | 249 | 90 | 28 | 8.9 | 1 | 64 | male | 68 | 183 | 138 | 80 | 44 | 41 | 300 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 361 | 301 | 90 | 118 | 2.6 | 0 | 89 | female | 61 | 115 | 218 | 90 | 31 | 41 | 210 |
| 362 | 296 | 369 | 46 | 6.4 | 1 | 53 | male | 69 | 173 | 138 | 94 | 35 | 39 | 210 |
| 363 | 284 | 89 | 54 | 5.3 | 0 | 51 | female | 63 | 154 | 140 | 100 | 32 | 43 | 180 |
| 364 | 194 | 269 | 38 | 5.1 | 1 | 29 | female | 69 | 167 | 120 | 70 | 33 | 40 | 20 |
| 365 | 199 | 76 | 52 | 3.8 | 0 | 41 | female | 63 | 197 | 120 | 78 | 41 | 48 | 255 |
366 rows × 14 columns
diabete.dtypes
chol          int64
stab.glu      int64
hdl           int64
ratio       float64
glyhb         int64
age           int64
gender       object
height        int64
weight        int64
bp.1s         int64
bp.1d         int64
waist         int64
hip           int64
time.ppn      int64
dtype: object
- Select rows ten through twenty-five of the dataset and the columns 'hdl' and 'height', using both loc and iloc.
iloc = diabete.iloc[9:25, [2, 7]]
iloc
| | hdl | height |
|---|---|---|
| 9 | 54 | 65 |
| 10 | 34 | 58 |
| 11 | 36 | 60 |
| 12 | 30 | 69 |
| 13 | 47 | 65 |
| 14 | 38 | 65 |
| 15 | 64 | 67 |
| 16 | 36 | 64 |
| 17 | 41 | 65 |
| 18 | 50 | 67 |
| 19 | 76 | 67 |
| 20 | 43 | 69 |
| 21 | 41 | 62 |
| 22 | 45 | 61 |
| 23 | 92 | 72 |
| 24 | 30 | 68 |
loc = diabete.loc[9:24, ['hdl', 'height']]
loc
| | hdl | height |
|---|---|---|
| 9 | 54 | 65 |
| 10 | 34 | 58 |
| 11 | 36 | 60 |
| 12 | 30 | 69 |
| 13 | 47 | 65 |
| 14 | 38 | 65 |
| 15 | 64 | 67 |
| 16 | 36 | 64 |
| 17 | 41 | 65 |
| 18 | 50 | 67 |
| 19 | 76 | 67 |
| 20 | 43 | 69 |
| 21 | 41 | 62 |
| 22 | 45 | 61 |
| 23 | 92 | 72 |
| 24 | 30 | 68 |
- Select the female subjects older than 55.
diabete[(diabete['gender'] == 'female') & (diabete['age'] > 55)]
| | chol | stab.glu | hdl | ratio | glyhb | age | gender | height | weight | bp.1s | bp.1d | waist | hip | time.ppn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 228 | 92 | 37 | 6.2 | 0 | 58 | female | 61 | 256 | 190 | 92 | 49 | 57 | 180 |
| 9 | 242 | 82 | 54 | 4.5 | 0 | 60 | female | 65 | 156 | 130 | 90 | 39 | 45 | 300 |
| 17 | 196 | 206 | 41 | 4.8 | 1 | 62 | female | 65 | 196 | 178 | 90 | 46 | 51 | 540 |
| 21 | 281 | 92 | 41 | 6.9 | 0 | 66 | female | 62 | 185 | 158 | 88 | 48 | 44 | 285 |
| 30 | 182 | 85 | 37 | 4.9 | 0 | 61 | female | 69 | 174 | 176 | 86 | 49 | 43 | 330 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 338 | 194 | 95 | 36 | 5.4 | 0 | 63 | female | 58 | 210 | 140 | 100 | 44 | 53 | 240 |
| 346 | 162 | 90 | 46 | 3.5 | 0 | 60 | female | 63 | 121 | 110 | 64 | 32 | 34 | 300 |
| 353 | 279 | 270 | 40 | 7.0 | 1 | 60 | female | 68 | 224 | 174 | 90 | 48 | 50 | 180 |
| 357 | 221 | 126 | 48 | 4.6 | 0 | 59 | female | 62 | 177 | 130 | 78 | 39 | 45 | 60 |
| 361 | 301 | 90 | 118 | 2.6 | 0 | 89 | female | 61 | 115 | 218 | 90 | 31 | 41 | 210 |
61 rows × 14 columns
- After selecting the subjects with a 'hip' measurement below 40, compute the mean of 'stab.glu' grouped by 'hip'.
reduc = diabete.loc[diabete['hip'] < 40, ['hip', 'stab.glu']]
reduc.groupby('hip').mean()
| | stab.glu |
|---|---|
| hip | |
| 30 | 106.000000 |
| 32 | 77.000000 |
| 33 | 90.571429 |
| 34 | 81.800000 |
| 35 | 77.800000 |
| 36 | 79.333333 |
| 37 | 96.307692 |
| 38 | 100.961538 |
| 39 | 104.441176 |
- Determine the number of men and women, and draw the histograms of height in two facets, split by gender.
diabete['gender'].value_counts()
gender
female    214
male      152
Name: count, dtype: int64
sns.displot(data = diabete, x = 'height', col = 'gender')
- Represent the distribution of cholesterol with boxplots, conditional on 'gender' and 'glyhb'.
sns.catplot(data = diabete, x = 'gender', y = 'chol', hue = 'glyhb', kind = 'box')
- Represent the relationships between the variables 'ratio', 'weight', 'waist' and 'hip' using pairplot, coloring the points by 'glyhb'.
sns.pairplot(data = diabete, vars = ['ratio', 'weight', 'waist', 'hip'], hue = 'glyhb')
- Fit a linear regression model with 'weight' as explanatory variable and 'waist' as response, including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous point?
lm = sm.OLS(diabete.waist, sm.add_constant(diabete.weight))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'weight', y = 'waist', data = diabete, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: waist R-squared: 0.726
Model: OLS Adj. R-squared: 0.725
Method: Least Squares F-statistic: 963.4
Date: Tue, 21 Jan 2025 Prob (F-statistic): 2.67e-104
Time: 15:57:09 Log-Likelihood: -926.02
No. Observations: 366 AIC: 1856.
Df Residuals: 364 BIC: 1864.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 16.2641 0.716 22.714 0.000 14.856 17.672
weight 0.1217 0.004 31.038 0.000 0.114 0.129
==============================================================================
Omnibus: 18.991 Durbin-Watson: 1.728
Prob(Omnibus): 0.000 Jarque-Bera (JB): 21.055
Skew: 0.514 Prob(JB): 2.68e-05
Kurtosis: 3.570 Cond. No. 822.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<seaborn.axisgrid.FacetGrid at 0x306443fb0>
The slope is positive, consistent with the previous scatterplot.
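The sign check on the slope can also be done programmatically. A minimal sketch on synthetic data (here `np.polyfit` stands in for `sm.OLS`, and the simulated weight/waist values are made up for illustration, not taken from the diabete data set):

```python
import numpy as np

# Synthetic positively related data standing in for weight (x) and waist (y)
rng = np.random.default_rng(0)
weight = rng.uniform(100, 300, size=200)
waist = 16 + 0.12 * weight + rng.normal(0, 2, size=200)

# polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(weight, waist, 1)
print(slope > 0)  # a positive slope matches an upward-sloping scatterplot
```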
Julius Caesar used a cipher to send encrypted messages so that no one could decode them.
The rule maps each letter of the alphabet to the letter three positions later: for
example A becomes D, B becomes E, and so on. Write a function that takes a string as input and converts it using the Caesar cipher, keeping in mind that the extended Italian alphabet consists of 26 letters
and that W maps to Z, after which the alphabet wraps around, so X maps to A, Y to B and Z to C. For example the
string 'ciao' would become 'fldr'. Strings containing digits or other symbols need not be handled.
def crittografia(stringa):
    alfabeto = 'abcdefghijklmnopqrstuvwxyz'
    res = ''
    for char in stringa:
        # shift each letter forward by 3 positions, wrapping around with % 26
        res += alfabeto[(alfabeto.index(char) + 3) % 26]
    return res
print(crittografia('ciao'))
print(crittografia('yoga'))
fldr
brjd
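The cipher can be inverted by shifting each letter back three positions instead of forward. A sketch of the decoding counterpart (the function name `decrittografia` is not part of the exercise):

```python
def decrittografia(stringa):
    # Inverse Caesar cipher: shift each letter back by 3, wrapping with % 26
    alfabeto = 'abcdefghijklmnopqrstuvwxyz'
    return ''.join(alfabeto[(alfabeto.index(c) - 3) % 26] for c in stringa)

print(decrittografia('fldr'))  # ciao
print(decrittografia('brjd'))  # yoga
```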
df = pd.read_csv('diamonds.csv')
df
| carat | cut | color | clarity | depth | table | price | x | y | z | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.19 | Premium | G | VS1 | 61.2 | 55.0 | 7797 | 6.86 | 6.81 | 4.18 |
| 1 | 0.36 | Very Good | D | VS2 | 62.4 | 58.0 | 780 | 4.48 | 4.53 | 2.81 |
| 2 | 0.30 | Ideal | E | VVS2 | 61.8 | 56.0 | 789 | 4.31 | 4.33 | 2.67 |
| 3 | 0.38 | Ideal | E | VVS2 | 61.6 | 55.0 | 1176 | 4.72 | 4.67 | 2.89 |
| 4 | 0.33 | Ideal | E | VVS1 | 61.9 | 54.0 | 945 | 4.45 | 4.47 | 2.76 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.91 | Ideal | G | SI1 | 60.8 | 58.0 | 4922 | 6.21 | 6.29 | 3.80 |
| 996 | 1.24 | Good | G | VS2 | 61.3 | 57.0 | 8672 | 6.78 | 6.88 | 4.19 |
| 997 | 1.00 | Good | I | VVS2 | 57.4 | 59.0 | 4032 | 6.61 | 6.53 | 3.77 |
| 998 | 1.53 | Premium | E | SI1 | 60.8 | 60.0 | 12499 | 7.51 | 7.43 | 4.54 |
| 999 | 0.74 | Ideal | G | SI1 | 61.3 | 57.0 | 3130 | 5.81 | 5.83 | 3.57 |
1000 rows × 10 columns
df.dtypes
carat float64 cut object color object clarity object depth float64 table float64 price int64 x float64 y float64 z float64 dtype: object
iloc = df.iloc[23:30, [1, 3]]
iloc
| cut | clarity | |
|---|---|---|
| 23 | Very Good | VS2 |
| 24 | Premium | VS2 |
| 25 | Premium | VVS2 |
| 26 | Ideal | VVS1 |
| 27 | Premium | VS2 |
| 28 | Ideal | VS1 |
| 29 | Ideal | VS2 |
loc = df.loc[23:29, ['cut', 'depth']]
loc
| cut | depth | |
|---|---|---|
| 23 | Very Good | 59.1 |
| 24 | Premium | 62.2 |
| 25 | Premium | 60.5 |
| 26 | Ideal | 61.1 |
| 27 | Premium | 58.8 |
| 28 | Ideal | 62.5 |
| 29 | Ideal | 62.9 |
df[(df['clarity'] == 'VS1') & (df['carat'] > 1)]
| carat | cut | color | clarity | depth | table | price | x | y | z | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.19 | Premium | G | VS1 | 61.2 | 55.0 | 7797 | 6.86 | 6.81 | 4.18 |
| 7 | 1.01 | Fair | I | VS1 | 64.9 | 58.0 | 4263 | 6.17 | 6.22 | 4.02 |
| 14 | 1.51 | Very Good | E | VS1 | 59.5 | 59.0 | 16129 | 7.48 | 7.54 | 4.47 |
| 107 | 1.09 | Ideal | G | VS1 | 60.3 | 57.0 | 8305 | 6.68 | 6.73 | 4.04 |
| 113 | 1.04 | Ideal | G | VS1 | 61.5 | 55.0 | 6831 | 6.54 | 6.55 | 4.02 |
| 118 | 1.01 | Ideal | G | VS1 | 62.4 | 56.0 | 6066 | 6.37 | 6.42 | 3.99 |
| 136 | 1.21 | Premium | H | VS1 | 62.3 | 58.0 | 7094 | 6.77 | 6.84 | 4.24 |
| 166 | 1.07 | Premium | G | VS1 | 61.4 | 56.0 | 6076 | 6.65 | 6.61 | 4.07 |
| 190 | 1.35 | Ideal | G | VS1 | 61.5 | 56.0 | 10378 | 7.15 | 7.12 | 4.39 |
| 191 | 1.55 | Premium | H | VS1 | 62.6 | 58.0 | 11562 | 7.40 | 7.34 | 4.61 |
| 202 | 1.18 | Ideal | I | VS1 | 62.2 | 55.0 | 6272 | 6.80 | 6.77 | 4.22 |
| 210 | 1.20 | Good | G | VS1 | 63.6 | 58.0 | 8387 | 6.59 | 6.56 | 4.18 |
| 220 | 1.03 | Ideal | D | VS1 | 61.5 | 57.0 | 8742 | 6.48 | 6.52 | 4.00 |
| 234 | 1.04 | Ideal | I | VS1 | 62.9 | 43.0 | 4997 | 6.45 | 6.41 | 4.04 |
| 257 | 1.54 | Ideal | F | VS1 | 60.3 | 57.0 | 18416 | 7.49 | 7.56 | 4.54 |
| 281 | 1.34 | Very Good | J | VS1 | 61.7 | 59.0 | 6237 | 7.03 | 7.14 | 4.37 |
| 320 | 1.26 | Ideal | H | VS1 | 61.5 | 59.0 | 7845 | 6.94 | 6.91 | 4.26 |
| 339 | 1.01 | Premium | D | VS1 | 62.4 | 58.0 | 8265 | 6.38 | 6.41 | 3.99 |
| 347 | 1.14 | Good | I | VS1 | 63.3 | 56.0 | 5056 | 6.60 | 6.68 | 4.20 |
| 403 | 1.27 | Premium | F | VS1 | 60.3 | 58.0 | 10028 | 7.06 | 7.04 | 4.25 |
| 487 | 1.07 | Very Good | D | VS1 | 59.9 | 55.0 | 9681 | 6.69 | 6.71 | 4.01 |
| 490 | 1.03 | Premium | H | VS1 | 62.0 | 59.0 | 5523 | 6.45 | 6.48 | 4.01 |
| 508 | 1.51 | Premium | G | VS1 | 59.5 | 59.0 | 14156 | 7.45 | 7.41 | 4.42 |
| 510 | 1.32 | Ideal | G | VS1 | 62.4 | 53.0 | 10631 | 7.03 | 7.08 | 4.40 |
| 517 | 1.05 | Ideal | H | VS1 | 61.8 | 55.0 | 6833 | 6.54 | 6.57 | 4.05 |
| 529 | 1.60 | Premium | J | VS1 | 61.3 | 59.0 | 9032 | 7.52 | 7.49 | 4.60 |
| 550 | 1.23 | Very Good | F | VS1 | 59.3 | 59.0 | 10609 | 6.98 | 7.01 | 4.15 |
| 560 | 1.51 | Fair | G | VS1 | 64.9 | 55.0 | 11739 | 7.25 | 7.14 | 4.67 |
| 578 | 1.09 | Ideal | D | VS1 | 62.3 | 56.0 | 9650 | 6.63 | 6.59 | 4.12 |
| 597 | 1.26 | Premium | F | VS1 | 62.0 | 58.0 | 10669 | 6.95 | 6.88 | 4.29 |
| 605 | 1.01 | Ideal | G | VS1 | 61.9 | 54.0 | 7179 | 6.44 | 6.48 | 4.00 |
| 661 | 1.01 | Ideal | G | VS1 | 62.4 | 56.0 | 6672 | 6.39 | 6.44 | 4.00 |
| 712 | 1.20 | Premium | E | VS1 | 60.7 | 57.0 | 10053 | 6.89 | 6.81 | 4.16 |
| 809 | 1.01 | Very Good | G | VS1 | 63.6 | 57.0 | 6905 | 6.30 | 6.35 | 4.02 |
| 811 | 1.09 | Premium | J | VS1 | 59.3 | 57.0 | 4303 | 6.79 | 6.74 | 4.01 |
| 830 | 1.08 | Premium | G | VS1 | 62.0 | 60.0 | 6689 | 6.55 | 6.51 | 4.05 |
| 839 | 1.87 | Ideal | H | VS1 | 59.7 | 60.0 | 17761 | 7.98 | 8.04 | 4.78 |
| 895 | 1.26 | Ideal | G | VS1 | 62.3 | 57.0 | 6604 | 6.93 | 6.87 | 4.30 |
| 913 | 1.19 | Ideal | H | VS1 | 62.1 | 54.0 | 7181 | 6.80 | 6.83 | 4.23 |
| 927 | 1.02 | Good | H | VS1 | 59.8 | 63.0 | 5598 | 6.54 | 6.61 | 3.93 |
df['color'].value_counts()
color G 204 E 202 F 186 H 158 D 106 I 102 J 42 Name: count, dtype: int64
g = sns.FacetGrid(data = df, col = 'color', col_wrap = 3)
g.map(sns.histplot, 'depth')
<seaborn.axisgrid.FacetGrid at 0x305e0fa70>
sns.catplot(data = df, x = 'cut', y = 'price', kind = 'box')
<seaborn.axisgrid.FacetGrid at 0x30696db20>
sns.pairplot(data = df)
<seaborn.axisgrid.PairGrid at 0x306b1a5a0>
lm = sm.OLS(df.price, sm.add_constant(df.carat))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'carat', y = 'price', data = df, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: price R-squared: 0.853
Model: OLS Adj. R-squared: 0.853
Method: Least Squares F-statistic: 5795.
Date: Tue, 21 Jan 2025 Prob (F-statistic): 0.00
Time: 16:58:31 Log-Likelihood: -8755.3
No. Observations: 1000 AIC: 1.751e+04
Df Residuals: 998 BIC: 1.752e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -2194.9975 94.871 -23.137 0.000 -2381.168 -2008.827
carat 7662.8765 100.659 76.127 0.000 7465.350 7860.403
==============================================================================
Omnibus: 268.582 Durbin-Watson: 1.977
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1461.927
Skew: 1.120 Prob(JB): 0.00
Kurtosis: 8.484 Cond. No. 3.64
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<seaborn.axisgrid.FacetGrid at 0x308e8b530>
The slope is positive, consistent with the scatterplot from the previous point.
import random
def dado(n):
    # comb[i, j] counts how often the pair (i+1, j+1) was rolled;
    # risultati row 0 holds the possible sums (2..12), row 1 their frequencies
    comb = np.zeros((6, 6), dtype = 'int')
    risultati = np.zeros((2, 11), dtype = 'int')
    for _ in range(n):
        dado1 = random.randint(1, 6)
        dado2 = random.randint(1, 6)
        comb[dado1 - 1, dado2 - 1] += 1
        somma = dado1 + dado2
        risultati[0, somma - 2] = somma
        risultati[1, somma - 2] += 1
    return comb, risultati
print(dado(20))
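As a sanity check on the simulation, the relative frequency of each sum should approach its theoretical probability for large n; with two fair dice, a sum of 7 has probability 6/36 ≈ 0.167. A self-contained sketch using the same rolling logic (the seed and sample size are arbitrary choices for the illustration):

```python
import random

random.seed(0)  # reproducibility for the sketch
n = 60_000
# count rolls of two fair dice whose sum is 7
sette = sum(1 for _ in range(n)
            if random.randint(1, 6) + random.randint(1, 6) == 7)
print(sette / n)  # should be close to 6/36
```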
os.getcwd()
'/Users/ludovicavargiu/Downloads'
os.chdir('/Users/ludovicavargiu/Desktop/Laboratorio Python')
data = pd.read_csv('myopia_study.csv')
data
| MYOPIC | AGE | GENDER | SPHEQ | AL | ACD | LT | VCD | SPORTHR | READHR | COMPHR | STUDYHR | TVHR | MOMMY | DADMY | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 6 | 1 | -0.052 | 21.89 | 3.690 | 3.498 | 14.70 | 45 | 8 | 0 | 0 | 10 | 1 | 1 |
| 1 | 0 | 6 | 1 | 0.608 | 22.38 | 3.702 | 3.392 | 15.29 | 4 | 0 | 1 | 1 | 7 | 1 | 1 |
| 2 | 0 | 6 | 1 | 1.179 | 22.49 | 3.462 | 3.514 | 15.52 | 14 | 0 | 2 | 0 | 10 | 0 | 0 |
| 3 | 1 | 6 | 1 | 0.525 | 22.20 | 3.862 | 3.612 | 14.73 | 18 | 11 | 0 | 0 | 4 | 0 | 1 |
| 4 | 0 | 5 | 0 | 0.697 | 23.29 | 3.676 | 3.454 | 16.16 | 14 | 0 | 0 | 0 | 4 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 613 | 1 | 6 | 0 | 0.678 | 22.40 | 3.663 | 3.803 | 14.93 | 2 | 0 | 7 | 3 | 14 | 1 | 0 |
| 614 | 0 | 6 | 1 | 0.665 | 22.50 | 3.570 | 3.378 | 15.56 | 6 | 0 | 1 | 0 | 8 | 1 | 1 |
| 615 | 0 | 6 | 0 | 1.834 | 22.94 | 3.624 | 3.424 | 15.89 | 8 | 0 | 0 | 0 | 4 | 1 | 1 |
| 616 | 0 | 6 | 1 | 0.665 | 21.92 | 3.688 | 3.598 | 14.64 | 12 | 2 | 1 | 0 | 15 | 0 | 0 |
| 617 | 0 | 6 | 0 | 0.802 | 22.26 | 3.530 | 3.484 | 15.25 | 25 | 0 | 2 | 0 | 10 | 1 | 1 |
618 rows × 15 columns
data.dtypes
MYOPIC int64 AGE int64 GENDER int64 SPHEQ float64 AL float64 ACD float64 LT float64 VCD float64 SPORTHR int64 READHR int64 COMPHR int64 STUDYHR int64 TVHR int64 MOMMY int64 DADMY int64 dtype: object
iloc = data.iloc[2: 8, [3, 7]]
iloc
| SPHEQ | VCD | |
|---|---|---|
| 2 | 1.179 | 15.52 |
| 3 | 0.525 | 14.73 |
| 4 | 0.697 | 16.16 |
| 5 | 1.744 | 15.36 |
| 6 | 0.683 | 15.49 |
| 7 | 1.272 | 15.08 |
loc = data.loc[2:7, ['SPHEQ', 'VCD']]
loc
| SPHEQ | VCD | |
|---|---|---|
| 2 | 1.179 | 15.52 |
| 3 | 0.525 | 14.73 |
| 4 | 0.697 | 16.16 |
| 5 | 1.744 | 15.36 |
| 6 | 0.683 | 15.49 |
| 7 | 1.272 | 15.08 |
data[(data['AGE'] > 7) & (data['TVHR'] <= 10)]
| MYOPIC | AGE | GENDER | SPHEQ | AL | ACD | LT | VCD | SPORTHR | READHR | COMPHR | STUDYHR | TVHR | MOMMY | DADMY | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 71 | 0 | 9 | 0 | 0.118 | 24.24 | 3.678 | 3.533 | 17.03 | 7 | 0 | 0 | 4 | 4 | 1 | 1 |
| 72 | 0 | 8 | 1 | 0.672 | 22.99 | 3.798 | 3.822 | 15.37 | 26 | 9 | 1 | 10 | 7 | 0 | 1 |
| 112 | 0 | 8 | 0 | 0.588 | 22.59 | 3.492 | 3.322 | 15.77 | 9 | 2 | 1 | 2 | 4 | 1 | 1 |
| 147 | 0 | 9 | 1 | 1.368 | 23.75 | 4.030 | 2.960 | 16.76 | 9 | 2 | 1 | 5 | 7 | 0 | 0 |
| 165 | 0 | 8 | 1 | 1.012 | 22.41 | 4.048 | 3.460 | 14.90 | 16 | 0 | 0 | 3 | 9 | 0 | 0 |
| 185 | 0 | 8 | 1 | 0.542 | 21.77 | 3.650 | 3.554 | 14.57 | 21 | 5 | 3 | 5 | 9 | 0 | 1 |
| 227 | 0 | 8 | 1 | 1.031 | 22.33 | 3.660 | 3.364 | 15.30 | 4 | 3 | 0 | 3 | 6 | 1 | 1 |
| 230 | 0 | 8 | 1 | 0.625 | 23.24 | 3.930 | 3.294 | 16.01 | 14 | 5 | 7 | 3 | 9 | 1 | 0 |
| 233 | 0 | 8 | 0 | 0.112 | 23.69 | 3.997 | 3.500 | 16.20 | 9 | 4 | 4 | 3 | 7 | 0 | 0 |
| 240 | 0 | 8 | 0 | 0.442 | 22.29 | 3.417 | 3.547 | 15.33 | 14 | 4 | 1 | 3 | 9 | 0 | 1 |
| 251 | 0 | 8 | 0 | 0.882 | 22.53 | 3.382 | 3.514 | 15.64 | 16 | 2 | 3 | 3 | 9 | 0 | 1 |
| 253 | 0 | 8 | 1 | 0.048 | 22.39 | 3.483 | 3.640 | 15.26 | 22 | 8 | 0 | 3 | 8 | 0 | 0 |
| 260 | 0 | 8 | 1 | 0.044 | 22.28 | 3.386 | 3.308 | 15.58 | 18 | 3 | 0 | 2 | 9 | 0 | 0 |
| 273 | 0 | 8 | 1 | 0.904 | 23.31 | 3.630 | 3.460 | 16.22 | 7 | 4 | 0 | 7 | 4 | 0 | 1 |
| 294 | 0 | 8 | 1 | 0.680 | 22.38 | 3.838 | 3.568 | 14.97 | 9 | 7 | 4 | 10 | 6 | 1 | 0 |
| 303 | 1 | 9 | 0 | -0.571 | 22.81 | 4.130 | 3.350 | 15.33 | 10 | 1 | 6 | 3 | 6 | 1 | 0 |
| 306 | 1 | 8 | 0 | 0.207 | 23.27 | 3.890 | 3.542 | 15.84 | 14 | 4 | 3 | 3 | 9 | 1 | 0 |
| 319 | 0 | 8 | 1 | 0.455 | 22.15 | 3.462 | 3.570 | 15.12 | 7 | 7 | 1 | 4 | 7 | 0 | 0 |
| 347 | 0 | 8 | 1 | 0.656 | 23.53 | 3.862 | 3.406 | 16.26 | 9 | 4 | 0 | 4 | 7 | 0 | 0 |
| 385 | 0 | 8 | 1 | 1.154 | 22.58 | 3.650 | 3.570 | 15.36 | 7 | 5 | 2 | 6 | 7 | 0 | 1 |
| 387 | 0 | 8 | 0 | 0.211 | 22.49 | 3.862 | 3.482 | 15.14 | 18 | 4 | 2 | 5 | 7 | 0 | 0 |
| 389 | 0 | 8 | 0 | 0.768 | 23.66 | 3.730 | 3.350 | 16.58 | 9 | 4 | 7 | 3 | 9 | 1 | 0 |
| 404 | 0 | 8 | 1 | 0.976 | 22.09 | 3.800 | 3.440 | 14.85 | 5 | 5 | 0 | 10 | 0 | 1 | 1 |
| 443 | 0 | 8 | 0 | 0.600 | 22.74 | 3.664 | 3.422 | 15.65 | 12 | 3 | 0 | 5 | 9 | 0 | 0 |
| 451 | 0 | 8 | 1 | 1.156 | 22.57 | 3.315 | 3.693 | 15.56 | 13 | 4 | 3 | 5 | 9 | 0 | 1 |
| 453 | 0 | 8 | 1 | 0.487 | 22.24 | 3.558 | 3.480 | 15.20 | 11 | 1 | 1 | 4 | 9 | 1 | 0 |
| 459 | 0 | 8 | 0 | 0.685 | 23.44 | 3.890 | 3.570 | 15.98 | 31 | 0 | 0 | 6 | 5 | 1 | 0 |
| 474 | 0 | 8 | 1 | 0.619 | 23.46 | 4.224 | 3.364 | 15.88 | 18 | 7 | 7 | 15 | 4 | 0 | 0 |
| 510 | 1 | 8 | 0 | -0.339 | 22.94 | 3.634 | 3.640 | 15.66 | 22 | 7 | 7 | 3 | 9 | 0 | 1 |
| 514 | 1 | 8 | 0 | 0.269 | 23.11 | 3.680 | 3.333 | 16.10 | 5 | 4 | 5 | 3 | 7 | 1 | 1 |
| 521 | 0 | 8 | 0 | 0.781 | 23.88 | 4.028 | 3.388 | 16.46 | 9 | 2 | 4 | 3 | 0 | 1 | 1 |
| 548 | 0 | 8 | 0 | 0.307 | 23.14 | 4.020 | 3.210 | 15.91 | 9 | 9 | 0 | 7 | 2 | 0 | 1 |
| 552 | 0 | 8 | 1 | 0.725 | 21.84 | 3.502 | 3.454 | 14.89 | 0 | 3 | 5 | 5 | 5 | 1 | 0 |
| 593 | 0 | 8 | 1 | 3.731 | 22.22 | 3.200 | 3.420 | 15.60 | 15 | 3 | 5 | 12 | 10 | 0 | 0 |
| 611 | 1 | 8 | 0 | -0.149 | 22.88 | 3.876 | 3.366 | 15.64 | 23 | 5 | 0 | 2 | 4 | 0 | 1 |
data[['AGE']].groupby('AGE').value_counts()
AGE 5 21 6 456 7 82 8 53 9 6 Name: count, dtype: int64
sns.catplot(data, x = 'STUDYHR', col = 'GENDER', kind = 'count')
<seaborn.axisgrid.FacetGrid at 0x30657dc70>
sns.catplot(data, x = 'MYOPIC', y = 'SPHEQ', hue = 'GENDER', kind = 'box')
<seaborn.axisgrid.FacetGrid at 0x30d9ec800>
sns.pairplot(data, vars = data.loc[:, 'SPHEQ':'TVHR'])
<seaborn.axisgrid.PairGrid at 0x306acfc80>
lm = sm.OLS(data.VCD, sm.add_constant(data.AL))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'AL', y = 'VCD', data = data, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: VCD R-squared: 0.887
Model: OLS Adj. R-squared: 0.887
Method: Least Squares F-statistic: 4845.
Date: Tue, 21 Jan 2025 Prob (F-statistic): 4.35e-294
Time: 17:24:17 Log-Likelihood: 50.775
No. Observations: 618 AIC: -97.55
Df Residuals: 616 BIC: -88.70
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -5.3161 0.297 -17.874 0.000 -5.900 -4.732
AL 0.9198 0.013 69.608 0.000 0.894 0.946
==============================================================================
Omnibus: 0.154 Durbin-Watson: 2.113
Prob(Omnibus): 0.926 Jarque-Bera (JB): 0.250
Skew: 0.010 Prob(JB): 0.883
Kurtosis: 2.904 Cond. No. 747.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<seaborn.axisgrid.FacetGrid at 0x30eecc9e0>
The slope is positive, consistent with the previous scatterplot.
Write a function that takes a positive integer n ≥ 1 and returns the sum of the squares of the first n positive integers, ∑_{k=1}^{n} k², or an error message if the condition on n is not satisfied. Write the function both in recursive and non-recursive form.
def func(n):
    somma = 0
    if n < 1:
        return 'errore'
    else:
        for i in range(1, n+1):
            somma += i**2
        return somma
print(func(5))
55
def func_r(n):
    if n < 1:
        return 'errore'
    elif n == 1:
        return 1
    else:
        return n**2 + func_r(n-1)
print(func_r(5))
55
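Both versions can be cross-checked against the closed form ∑_{k=1}^{n} k² = n(n+1)(2n+1)/6. A sketch (`func_closed` is not part of the exercise, just a verification aid):

```python
def func_closed(n):
    # Closed form for the sum of the first n squares: n(n+1)(2n+1)/6
    return 'errore' if n < 1 else n * (n + 1) * (2 * n + 1) // 6

print(func_closed(5))  # 55, matching func(5) and func_r(5)
```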
df = pd.read_csv('nutri.csv')
df
| smoking | gender | age | education | weight | height | bmi | |
|---|---|---|---|---|---|---|---|
| 0 | yes | male | 62 | 15+ | 94.8 | 184.5 | 27.8 |
| 1 | yes | male | 53 | 12-13 | 90.4 | 171.4 | 30.8 |
| 2 | yes | male | 78 | 12-13 | 83.4 | 170.1 | 28.8 |
| 3 | no | female | 56 | 15+ | 109.8 | 160.9 | 42.4 |
| 4 | no | female | 42 | 14-15 | 55.2 | 164.9 | 20.3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5390 | yes | female | 76 | 12-13 | 59.1 | 165.8 | 21.5 |
| 5391 | no | male | 26 | 15+ | 112.1 | 182.2 | 33.8 |
| 5392 | yes | female | 80 | 14-15 | 71.7 | 152.2 | 31.0 |
| 5393 | yes | male | 35 | <8 | 78.2 | 173.3 | 26.0 |
| 5394 | no | female | 24 | 15+ | 58.3 | 165.0 | 21.4 |
5395 rows × 7 columns
df.dtypes
smoking object gender object age int64 education object weight float64 height float64 bmi float64 dtype: object
iloc = df.iloc[17:27, [1, 4]]
iloc
| gender | weight | |
|---|---|---|
| 17 | female | 59.0 |
| 18 | male | 72.8 |
| 19 | female | 67.7 |
| 20 | female | 77.7 |
| 21 | female | 56.6 |
| 22 | male | 69.0 |
| 23 | female | 87.8 |
| 24 | male | 73.7 |
| 25 | female | 75.6 |
| 26 | male | 102.1 |
loc = df.loc[17:26, ['gender', 'weight']]
loc
| gender | weight | |
|---|---|---|
| 17 | female | 59.0 |
| 18 | male | 72.8 |
| 19 | female | 67.7 |
| 20 | female | 77.7 |
| 21 | female | 56.6 |
| 22 | male | 69.0 |
| 23 | female | 87.8 |
| 24 | male | 73.7 |
| 25 | female | 75.6 |
| 26 | male | 102.1 |
df[(df['smoking'] == 'yes') & (df['height'] > 180)]
| smoking | gender | age | education | weight | height | bmi | |
|---|---|---|---|---|---|---|---|
| 0 | yes | male | 62 | 15+ | 94.8 | 184.5 | 27.8 |
| 30 | yes | male | 56 | <8 | 85.6 | 187.4 | 24.4 |
| 38 | yes | male | 24 | 14-15 | 89.2 | 182.2 | 26.9 |
| 60 | yes | male | 30 | 14-15 | 89.1 | 181.5 | 27.0 |
| 62 | yes | male | 41 | 14-15 | 146.1 | 189.4 | 40.7 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5277 | yes | male | 29 | 14-15 | 60.1 | 183.0 | 17.9 |
| 5304 | yes | male | 53 | 12-13 | 97.7 | 183.1 | 29.1 |
| 5325 | yes | female | 55 | 14-15 | 123.0 | 181.5 | 37.3 |
| 5352 | yes | male | 48 | 12-13 | 107.9 | 181.5 | 32.8 |
| 5361 | yes | male | 33 | 8-11 | 63.0 | 180.3 | 19.4 |
236 rows × 7 columns
df[['smoking', 'gender', 'weight']].groupby(['gender', 'smoking']).mean()
| weight | ||
|---|---|---|
| gender | smoking | |
| female | no | 75.649094 |
| yes | 79.531342 | |
| male | no | 86.754267 |
| yes | 87.266423 |
sns.displot(data = df, x = 'height', col = 'gender')
<seaborn.axisgrid.FacetGrid at 0x301b0fe90>
sns.catplot(data = df, x = 'education', y = 'bmi', kind = 'box')
<seaborn.axisgrid.FacetGrid at 0x16a7d9e80>
sns.pairplot(df, hue = 'smoking')
<seaborn.axisgrid.PairGrid at 0x304e6bf50>
lm = sm.OLS(df.bmi, sm.add_constant(df.weight))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'weight', y = 'bmi', data = df, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: bmi R-squared: 0.778
Model: OLS Adj. R-squared: 0.778
Method: Least Squares F-statistic: 1.891e+04
Date: Tue, 21 Jan 2025 Prob (F-statistic): 0.00
Time: 17:48:12 Log-Likelihood: -14152.
No. Observations: 5395 AIC: 2.831e+04
Df Residuals: 5393 BIC: 2.832e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 6.1087 0.176 34.637 0.000 5.763 6.454
weight 0.2868 0.002 137.511 0.000 0.283 0.291
==============================================================================
Omnibus: 287.794 Durbin-Watson: 2.026
Prob(Omnibus): 0.000 Jarque-Bera (JB): 374.589
Skew: 0.518 Prob(JB): 4.56e-82
Kurtosis: 3.771 Cond. No. 329.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<seaborn.axisgrid.FacetGrid at 0x305104f80>
The coefficient is positive, consistent with the scatterplot from the previous point.
When writing code, execution times matter a great deal, and many programming languages report them in seconds. Write a function that takes a positive integer representing the running time of a program in seconds and converts it into hours, minutes and seconds. The function must return an error message if the input number is negative. For example, the input 7622 corresponds to 2 hours, 7 minutes and 2 seconds.
def time(n):
    if n < 0:
        return 'errore'
    ore = n // 3600
    secondi_rimanenti = n % 3600
    minuti = secondi_rimanenti // 60
    secondi = secondi_rimanenti % 60
    # return the formatted string rather than printing inside the function
    return f'{ore} ore, {minuti} minuti e {secondi} secondi'
print(time(7622))
2 ore, 7 minuti e 2 secondi
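The two division/remainder steps can also be written with `divmod`, which returns quotient and remainder in one call. A sketch of this variant (the name `time_divmod` is not part of the exercise):

```python
def time_divmod(n):
    if n < 0:
        return 'errore'
    # divmod(a, b) returns (a // b, a % b)
    ore, resto = divmod(n, 3600)
    minuti, secondi = divmod(resto, 60)
    return f'{ore} ore, {minuti} minuti e {secondi} secondi'

print(time_divmod(7622))  # 2 ore, 7 minuti e 2 secondi
```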
os.getcwd()
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
os.chdir('/Users/ludovicavargiu/Downloads')
ais = pd.read_csv('AIS.csv')
ais
| Sex | Sport | LBM | Ht | Wt | BMI | SSF | RBC | WBC | HCT | HGB | Ferr | PBF | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F | BBall | 63.32 | 195.9 | 78.9 | 20.56 | 109.1 | 3.96 | 7.5 | 37.5 | 12.3 | 60 | 19.75 |
| 1 | F | BBall | 58.55 | 189.7 | 74.4 | 20.67 | 102.8 | 4.41 | 8.3 | 38.2 | 12.7 | 68 | 21.30 |
| 2 | F | BBall | 55.36 | 177.8 | 69.1 | 21.86 | 104.6 | 4.14 | 5.0 | 36.4 | 11.6 | 21 | 19.88 |
| 3 | F | BBall | 57.18 | 185.0 | 74.9 | 21.88 | 126.4 | 4.11 | 5.3 | 37.3 | 12.6 | 69 | 23.66 |
| 4 | F | BBall | 53.20 | 184.6 | 64.6 | 18.96 | 80.3 | 4.45 | 6.8 | 41.5 | 14.0 | 29 | 17.64 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | M | WPolo | 82.00 | 183.9 | 93.2 | 27.56 | 67.2 | 4.90 | 7.6 | 45.6 | 16.0 | 90 | 11.79 |
| 198 | M | Tennis | 72.00 | 183.5 | 80.0 | 23.76 | 56.5 | 5.66 | 8.3 | 50.2 | 17.7 | 38 | 10.05 |
| 199 | M | Tennis | 68.00 | 183.1 | 73.8 | 22.01 | 47.6 | 5.03 | 6.4 | 42.7 | 14.3 | 122 | 8.51 |
| 200 | M | Tennis | 63.00 | 178.4 | 71.1 | 22.34 | 60.4 | 4.97 | 8.8 | 43.0 | 14.9 | 233 | 11.50 |
| 201 | M | Tennis | 72.00 | 190.8 | 76.7 | 21.07 | 34.9 | 5.38 | 6.3 | 46.0 | 15.7 | 32 | 6.26 |
202 rows × 13 columns
ais.dtypes
Sex object Sport object LBM float64 Ht float64 Wt float64 BMI float64 SSF float64 RBC float64 WBC float64 HCT float64 HGB float64 Ferr int64 PBF float64 dtype: object
iloc = ais.iloc[8:16, [2, 9]]
iloc
| LBM | HCT | |
|---|---|---|
| 8 | 54.57 | 41.1 |
| 9 | 53.42 | 41.6 |
| 10 | 68.53 | 41.4 |
| 11 | 61.85 | 43.8 |
| 12 | 48.32 | 41.4 |
| 13 | 66.24 | 41.0 |
| 14 | 57.92 | 43.7 |
| 15 | 56.52 | 40.3 |
loc = ais.loc[8:15, ['LBM', 'HCT']]
loc
| LBM | HCT | |
|---|---|---|
| 8 | 54.57 | 41.1 |
| 9 | 53.42 | 41.6 |
| 10 | 68.53 | 41.4 |
| 11 | 61.85 | 43.8 |
| 12 | 48.32 | 41.4 |
| 13 | 66.24 | 41.0 |
| 14 | 57.92 | 43.7 |
| 15 | 56.52 | 40.3 |
ais[(ais['Sex'] == 'M') & (ais['BMI'] > 25)]
| Sex | Sport | LBM | Ht | Wt | BMI | SSF | RBC | WBC | HCT | HGB | Ferr | PBF | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 107 | M | Swim | 78.0 | 184.0 | 85.00 | 25.11 | 52.3 | 4.75 | 8.6 | 45.5 | 15.2 | 99 | 8.54 |
| 109 | M | Swim | 81.0 | 187.2 | 92.00 | 26.25 | 65.3 | 4.87 | 4.8 | 44.9 | 15.4 | 124 | 11.72 |
| 112 | M | Swim | 91.0 | 190.4 | 96.90 | 26.73 | 35.2 | 4.32 | 4.3 | 41.6 | 14.0 | 177 | 6.46 |
| 114 | M | Rowing | 75.0 | 181.8 | 85.40 | 25.84 | 61.8 | 5.04 | 7.1 | 44.0 | 14.8 | 64 | 12.61 |
| 117 | M | Rowing | 78.0 | 186.0 | 86.80 | 25.09 | 60.2 | 4.78 | 9.3 | 43.0 | 14.7 | 150 | 10.05 |
| 119 | M | Rowing | 79.0 | 185.6 | 87.20 | 25.31 | 44.5 | 5.22 | 8.4 | 47.5 | 16.2 | 89 | 9.36 |
| 121 | M | Rowing | 82.0 | 185.6 | 89.80 | 26.07 | 44.7 | 5.40 | 6.8 | 49.5 | 17.3 | 183 | 8.61 |
| 122 | M | Rowing | 82.0 | 189.0 | 91.10 | 25.50 | 64.9 | 4.92 | 5.4 | 46.2 | 15.8 | 84 | 9.53 |
| 124 | M | Rowing | 83.0 | 185.6 | 92.30 | 26.79 | 58.3 | 5.09 | 10.1 | 44.9 | 14.8 | 118 | 9.79 |
| 125 | M | Rowing | 88.0 | 194.6 | 97.00 | 25.61 | 52.8 | 4.83 | 5.0 | 43.8 | 15.1 | 61 | 8.97 |
| 126 | M | Rowing | 83.0 | 189.0 | 89.50 | 25.06 | 43.1 | 5.22 | 6.0 | 46.6 | 15.7 | 72 | 7.49 |
| 132 | M | BBall | 97.0 | 209.4 | 113.70 | 25.93 | 88.9 | 5.17 | 8.0 | 47.9 | 16.4 | 36 | 14.53 |
| 134 | M | BBall | 90.0 | 198.7 | 100.20 | 25.38 | 61.8 | 4.50 | 9.2 | 40.7 | 13.7 | 72 | 10.64 |
| 144 | M | Field | 88.0 | 185.1 | 102.70 | 29.97 | 71.1 | 5.09 | 8.9 | 46.3 | 15.4 | 44 | 13.97 |
| 145 | M | Field | 83.0 | 185.5 | 94.25 | 27.39 | 65.9 | 5.11 | 9.6 | 48.2 | 16.7 | 103 | 11.66 |
| 157 | M | TSprnt | 75.0 | 178.5 | 80.20 | 25.17 | 30.3 | 4.88 | 4.3 | 45.6 | 15.5 | 80 | 6.76 |
| 159 | M | Field | 102.0 | 185.0 | 111.30 | 32.52 | 55.7 | 5.48 | 4.6 | 49.4 | 18.0 | 132 | 8.51 |
| 161 | M | Field | 78.0 | 180.1 | 97.90 | 30.18 | 112.5 | 5.01 | 8.9 | 46.0 | 15.9 | 212 | 19.94 |
| 162 | M | Field | 106.0 | 189.2 | 123.20 | 34.42 | 82.7 | 5.48 | 6.2 | 48.2 | 16.3 | 94 | 13.91 |
| 175 | M | TSprnt | 86.0 | 189.1 | 94.80 | 26.51 | 52.8 | 5.50 | 6.4 | 48.1 | 16.5 | 40 | 9.40 |
| 177 | M | Field | 89.0 | 179.1 | 108.20 | 33.73 | 113.5 | 4.96 | 8.3 | 45.3 | 15.7 | 141 | 17.41 |
| 178 | M | Field | 80.0 | 180.1 | 97.90 | 30.18 | 96.9 | 5.01 | 8.9 | 46.0 | 15.9 | 212 | 18.08 |
| 181 | M | WPolo | 77.0 | 192.7 | 94.20 | 25.37 | 96.3 | 4.63 | 9.1 | 42.1 | 14.4 | 126 | 18.72 |
| 184 | M | WPolo | 71.0 | 182.7 | 86.20 | 25.82 | 100.7 | 5.34 | 10.0 | 46.8 | 16.2 | 94 | 17.24 |
| 188 | M | WPolo | 85.0 | 192.6 | 93.50 | 25.21 | 47.8 | 5.01 | 9.8 | 46.5 | 15.8 | 97 | 8.87 |
| 191 | M | WPolo | 86.0 | 193.9 | 101.00 | 26.86 | 75.6 | 5.08 | 8.5 | 46.3 | 15.6 | 117 | 14.98 |
| 193 | M | WPolo | 79.0 | 185.3 | 87.30 | 25.43 | 49.5 | 4.63 | 14.3 | 44.8 | 15.0 | 133 | 8.97 |
| 195 | M | WPolo | 82.0 | 184.6 | 94.70 | 27.79 | 75.7 | 5.34 | 6.2 | 49.8 | 17.2 | 143 | 13.49 |
| 197 | M | WPolo | 82.0 | 183.9 | 93.20 | 27.56 | 67.2 | 4.90 | 7.6 | 45.6 | 16.0 | 90 | 11.79 |
ais[['Wt', 'Sport']].groupby('Sport').mean()
| Wt | |
|---|---|
| Sport | |
| BBall | 79.776000 |
| Field | 89.971053 |
| Gym | 43.625000 |
| Netball | 69.593478 |
| Rowing | 78.537838 |
| Swim | 75.145455 |
| T400m | 64.046552 |
| TSprnt | 71.506667 |
| Tennis | 64.472727 |
| WPolo | 86.729412 |
sns.displot(ais, x = 'HGB', col = 'Sex')
<seaborn.axisgrid.FacetGrid at 0x3027f3f50>
sns.pairplot(ais, vars = ais.iloc[:, 2: 7], hue = 'Sex')
<seaborn.axisgrid.PairGrid at 0x3052b7f50>
lm = sm.OLS(ais.Wt, sm.add_constant(ais.Ht))
res = lm.fit()
print(res.summary())
sns.lmplot(x = 'Ht', y = 'Wt', data = ais, ci = None)
OLS Regression Results
==============================================================================
Dep. Variable: Wt R-squared: 0.610
Model: OLS Adj. R-squared: 0.608
Method: Least Squares F-statistic: 312.6
Date: Tue, 21 Jan 2025 Prob (F-statistic): 9.64e-43
Time: 18:15:57 Log-Likelihood: -723.08
No. Observations: 202 AIC: 1450.
Df Residuals: 200 BIC: 1457.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -126.1890 11.397 -11.073 0.000 -148.662 -103.716
Ht 1.1171 0.063 17.680 0.000 0.993 1.242
==============================================================================
Omnibus: 57.269 Durbin-Watson: 1.580
Prob(Omnibus): 0.000 Jarque-Bera (JB): 136.811
Skew: 1.266 Prob(JB): 1.96e-30
Kurtosis: 6.137 Cond. No. 3.35e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.35e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
<seaborn.axisgrid.FacetGrid at 0x1750803b0>
The slope 1.1171 is positive, consistent with the previous scatterplot.
# replace twenty random positions of the Wt column with NaN
c = ais['Wt'].copy()
pos = random.sample(range(len(ais)), k = 20)
c.loc[pos] = np.nan
c.info()
<class 'pandas.core.series.Series'> RangeIndex: 202 entries, 0 to 201 Series name: Wt Non-Null Count Dtype -------------- ----- 182 non-null float64 dtypes: float64(1) memory usage: 1.7 KB
c2 = c.fillna(c.mean())
c2.info()
<class 'pandas.core.series.Series'> RangeIndex: 202 entries, 0 to 201 Series name: Wt Non-Null Count Dtype -------------- ----- 202 non-null float64 dtypes: float64(1) memory usage: 1.7 KB
c2[pos]
59 75.404396 168 75.404396 137 75.404396 65 75.404396 153 75.404396 34 75.404396 57 75.404396 86 75.404396 30 75.404396 149 75.404396 113 75.404396 28 75.404396 114 75.404396 73 75.404396 43 75.404396 55 75.404396 64 75.404396 7 75.404396 115 75.404396 54 75.404396 Name: Wt, dtype: float64
c2.to_csv('random.csv')
c2_csv = pd.read_csv('random.csv')
c2_csv
| Unnamed: 0 | Wt | |
|---|---|---|
| 0 | 0 | 78.9 |
| 1 | 1 | 74.4 |
| 2 | 2 | 69.1 |
| 3 | 3 | 74.9 |
| 4 | 4 | 64.6 |
| ... | ... | ... |
| 197 | 197 | 93.2 |
| 198 | 198 | 80.0 |
| 199 | 199 | 73.8 |
| 200 | 200 | 71.1 |
| 201 | 201 | 76.7 |
202 rows × 2 columns
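The `Unnamed: 0` column appears because `to_csv` writes the index by default; it can be avoided with `index=False` on write, or recovered as the index with `index_col=0` on read. A minimal round-trip sketch on toy data (using an in-memory buffer instead of a file):

```python
import io
import pandas as pd

s = pd.Series([78.9, 74.4, 69.1], name='Wt')

buf = io.StringIO()
s.to_csv(buf, index=False)  # no index column is written
buf.seek(0)

back = pd.read_csv(buf)
print(back.columns.tolist())  # only 'Wt', no 'Unnamed: 0'
```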
print(c2.to_string())
0      78.900000
1      74.400000
2      69.100000
3      74.900000
4      64.600000
         ...
197    93.200000
198    80.000000
199    73.800000
200    71.100000
201    76.700000
(202 values; output abbreviated)
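The value 75.404396 recurs throughout the output above, which suggests the missing entries of the series were replaced with the mean of the observed values. A minimal sketch of that pattern (the series `s` below is illustrative, not the original data):

```python
import pandas as pd
import numpy as np

# Toy series with two missing values
s = pd.Series([78.9, np.nan, 69.1, np.nan, 64.6])

# Replace every NaN with the mean of the non-missing values
filled = s.fillna(s.mean())
print(filled)
```

After the replacement the series contains no NaN, and both filled positions hold the same value, the mean of the observed entries.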
ais['bmi_qcut'] = pd.qcut(ais['BMI'], q = 4)
ais
| Sex | Sport | LBM | Ht | Wt | BMI | SSF | RBC | WBC | HCT | HGB | Ferr | PBF | bmi_qcut | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F | BBall | 63.32 | (186.175, 209.4] | 78.9 | 20.56 | 109.1 | 3.96 | 7.5 | 37.5 | 12.3 | 60 | 19.75 | (16.749, 21.082] |
| 1 | F | BBall | 58.55 | (186.175, 209.4] | 74.4 | 20.67 | 102.8 | 4.41 | 8.3 | 38.2 | 12.7 | 68 | 21.30 | (16.749, 21.082] |
| 2 | F | BBall | 55.36 | (174.0, 179.7] | 69.1 | 21.86 | 104.6 | 4.14 | 5.0 | 36.4 | 11.6 | 21 | 19.88 | (21.082, 22.72] |
| 3 | F | BBall | 57.18 | (179.7, 186.175] | 74.9 | 21.88 | 126.4 | 4.11 | 5.3 | 37.3 | 12.6 | 69 | 23.66 | (21.082, 22.72] |
| 4 | F | BBall | 53.20 | (179.7, 186.175] | 64.6 | 18.96 | 80.3 | 4.45 | 6.8 | 41.5 | 14.0 | 29 | 17.64 | (16.749, 21.082] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | M | WPolo | 82.00 | (179.7, 186.175] | 93.2 | 27.56 | 67.2 | 4.90 | 7.6 | 45.6 | 16.0 | 90 | 11.79 | (24.465, 34.42] |
| 198 | M | Tennis | 72.00 | (179.7, 186.175] | 80.0 | 23.76 | 56.5 | 5.66 | 8.3 | 50.2 | 17.7 | 38 | 10.05 | (22.72, 24.465] |
| 199 | M | Tennis | 68.00 | (179.7, 186.175] | 73.8 | 22.01 | 47.6 | 5.03 | 6.4 | 42.7 | 14.3 | 122 | 8.51 | (21.082, 22.72] |
| 200 | M | Tennis | 63.00 | (174.0, 179.7] | 71.1 | 22.34 | 60.4 | 4.97 | 8.8 | 43.0 | 14.9 | 233 | 11.50 | (21.082, 22.72] |
| 201 | M | Tennis | 72.00 | (186.175, 209.4] | 76.7 | 21.07 | 34.9 | 5.38 | 6.3 | 46.0 | 15.7 | 32 | 6.26 | (16.749, 21.082] |
202 rows × 14 columns
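`pd.qcut` splits a variable on its sample quantiles, so with `q=4` each bin (quartile class) receives roughly the same number of observations, unlike `pd.cut`, which uses equal-width intervals. A self-contained sketch on toy BMI-like values (not the `ais` data):

```python
import pandas as pd

# Eight illustrative BMI-like values
bmi = pd.Series([18.0, 19.5, 20.0, 21.0, 22.0, 23.0, 24.0, 26.0])

# q=4 cuts at the quartiles, giving four equally populated classes
quartili = pd.qcut(bmi, q=4)
print(quartili.value_counts())
```

With eight values and four quantile bins, each class ends up with exactly two observations.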